-
-
Notifications
You must be signed in to change notification settings - Fork 152
feat: reduce np.ndarray #1462
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
feat: reduce np.ndarray #1462
Conversation
| ) -> tuple[np.ndarray, np.ndarray]: ... | ||
| ) -> tuple[npt.NDArray[Any], npt.NDArray[Any]]: ... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this a chance to make this more precise, like
tuple[np_1d_array[np.intp], np_1d_array[np.intp]]?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have no insight into this class / this class method, nor is it documented in any detail (personally I don't even know what "window bounds" means). It would be great if you suggest a few tests, or can we leave it in another PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure happy to leave it til later!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer the change suggested by @MarcoGorelli . See the docs here: https://pandas.pydata.org/docs/dev/user_guide/window.html#custom-window-rolling
The arrays have to be np_ndarray_int64
|
|
||
| class CategoricalIndex(ExtensionIndex[S1], accessor.PandasDelegate): | ||
| codes: np.ndarray = ... | ||
| codes: npt.NDArray[Any] = ... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
np_1d_array[np.intp]?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/pandas_nightly
| ) -> tuple[np.ndarray, np.ndarray]: ... | ||
| ) -> tuple[npt.NDArray[Any], npt.NDArray[Any]]: ... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have no insight into this class / this class method, nor is it documented in any detail (personally I don't even know what "window bounds" means). It would be great if you suggest a few tests, or can we leave it in another PR?
|
|
||
| class CategoricalIndex(ExtensionIndex[S1], accessor.PandasDelegate): | ||
| codes: np.ndarray = ... | ||
| codes: npt.NDArray[Any] = ... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| @property | ||
| def is_monotonic_increasing(self) -> bool: ... | ||
| def clear_mapping(self) -> None: ... | ||
| class IntervalTree(IntervalMixin): ... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if the above comment is right, this can probably just be removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
class IntervalTree is something implemented in pandas with Cython. I don't understand what's happening with Ctyhon, but would like to leave its presence in the stubs, even though it's empty. Please tell me if you have a good argument to remove it completely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not a documented class, so we should remove it from the stubs. And then change the references in _MidDescriptor and _LengthDescriptor to be IntervalMixin
pandas-stubs/core/base.pyi
Outdated
| | np_ndarray_float | ||
| | np_ndarray_complex | ||
| | dict[str, np.ndarray] | ||
| | dict[str, np_1darray[Any]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the general pattern you seem to be following is:
- for function arguments, don't be restrictive on shape
- for return types, be as precise as possible
Given that NumListLike is often used as argument, would this one be better as Any shape?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for function arguments, don't be restrictive on shape
This is relevant until the end of python 3.10 because it can only use an old version of numpy, which does not give the correct shape upon construction, i.e. np.array([1, 2, 3]) does not give a 1-d array with that old version, in static typing.
would this one be better as Any shape?
Yep dc85974
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/pandas_nightly
| @property | ||
| def is_monotonic_increasing(self) -> bool: ... | ||
| def clear_mapping(self) -> None: ... | ||
| class IntervalTree(IntervalMixin): ... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
class IntervalTree is something implemented in pandas with Cython. I don't understand what's happening with Ctyhon, but would like to leave its presence in the stubs, even though it's empty. Please tell me if you have a good argument to remove it completely.
pandas-stubs/core/base.pyi
Outdated
| | np_ndarray_float | ||
| | np_ndarray_complex | ||
| | dict[str, np.ndarray] | ||
| | dict[str, np_1darray[Any]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for function arguments, don't be restrictive on shape
This is relevant until the end of python 3.10 because it can only use an old version of numpy, which does not give the correct shape upon construction, i.e. np.array([1, 2, 3]) does not give a 1-d array with that old version, in static typing.
would this one be better as Any shape?
Yep dc85974
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few main issues:
- I think we should be consistent in the stubs in using the
np_ndarray_xxxtypes that are now in_typing.pyi, rather thannpt.NDArray[np.integer], etc. I think it's better to use those types that we have in_typing.pyiso we can be consistent througout the stubs. pd.unique(someIndex)andIndex.unique()are a bit of a mess, because the behavior changes from 2.3 to 3.0. See the release notes: https://pandas.pydata.org/docs/dev/whatsnew/v3.0.0.html#other . So what we need to do here is return thenumpytypes as is done in 2.3, and put in a comment to switch to theIndextypes when 3.0 is released.- I don't think we should use
npt.NDArray[np.double]in the stubs. We always usenp.floating - There are places where we can be more precise on what are the allowed numpy array types as arguments.
I didn't necessarily pick up everything related to the main issues, and some of my comments are related to the 3 issues above.
| @property | ||
| def is_monotonic_increasing(self) -> bool: ... | ||
| def clear_mapping(self) -> None: ... | ||
| class IntervalTree(IntervalMixin): ... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not a documented class, so we should remove it from the stubs. And then change the references in _MidDescriptor and _LengthDescriptor to be IntervalMixin
| def __iter__(self): ... | ||
| @property | ||
| def asi8(self) -> np.ndarray: ... | ||
| def asi8(self) -> np_1darray[Any]: ... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Result of this will be np_ndarray_int64
| ignore_na: bool = ..., | ||
| axis: Axis = ..., | ||
| times: str | np.ndarray | Series | np.timedelta64 | None = ..., | ||
| times: str | npt.NDArray[Any] | Series | np.timedelta64 | None = ..., |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| times: str | npt.NDArray[Any] | Series | np.timedelta64 | None = ..., | |
| times: str | np_ndarray_dt | Series | np.timedelta64 | None = ..., |
has to be datetimes
| @cache_readonly | ||
| def groups(self) -> PrettyDict[Hashable, Index]: ... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure why you added this. I think we can delete the class Grouping, because it is only used by the class BaseGrouper, which is only used by the class BinGrouper, which is never referenced.
None of those are documented.
| ) -> tuple[np.ndarray, np.ndarray]: ... | ||
| ) -> tuple[npt.NDArray[Any], npt.NDArray[Any]]: ... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer the change suggested by @MarcoGorelli . See the docs here: https://pandas.pydata.org/docs/dev/user_guide/window.html#custom-window-rolling
The arrays have to be np_ndarray_int64
| ignore_na: _bool = False, | ||
| axis: Axis = 0, | ||
| times: np.ndarray | Series | None = None, | ||
| times: npt.NDArray[Any] | Series | None = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| times: npt.NDArray[Any] | Series | None = None, | |
| times: np_ndarray_dt | Series | None = None, |
| ) | ||
| check( | ||
| assert_type(pd.unique(pd.RangeIndex(0, 10)), np.ndarray), | ||
| assert_type(pd.unique(pd.RangeIndex(0, 10)), np_1darray[Any] | pd.Index), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't think this can be a numpy array
| assert_type( | ||
| pd.unique(pd.timedelta_range(start="1 day", periods=4)), | ||
| np.ndarray, | ||
| np_1darray[Any] | pd.Index, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
UGH - looks like the type depends on the version of pandas.
So what we should do is in the stubs, use the numpy array as the return type. Add a comment that it will switch to the Index type with 3.0, and we'll make the change then
I think this is true for many of the pd.unique() overloads.
| assert_type(pd.unique(dti), np_1darray[np.datetime64] | pd.DatetimeIndex), | ||
| np_1darray if PD_LTE_23 else pd.DatetimeIndex, | ||
| np.datetime64, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same issue here with the change from 2.3 to 3.0. So stubs should return the 2.3 types, and we'll change it when 3.0 is released.
| check(assert_type(d1, npt.NDArray), np.ndarray) | ||
| check(assert_type(e0, npt.NDArray[np.intp]), np.ndarray) | ||
| check(assert_type(e1, npt.NDArray), np.ndarray) | ||
| check(assert_type(d1, np_1darray[np.double]), np_1darray[np.double]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't think we should use np_1darray[np.double] in our stubs. Just use np_ndarray_float
same in tests below
npt.NDArrayto benpt.NDArray[Any]